NSF PAR Search | NSF Public Access Repository

The JIBO Kids Corpus: A speech dataset of child-robot interactions in a classroom environment

https://doi.org/10.1121/10.0034195

Shankar, Natarajan Balaji; Afshan, Amber; Johnson, Alexander; Mahapatra, Aurosweta; Martin, Alejandra; Ni, Haolun; Park, Hae Won; Perez, Marlen Quintero; Yeung, Gary; Bailey, Alison; et al. (November 2024, JASA Express Letters)

This paper describes an original dataset of children's speech, collected through the use of JIBO, a social robot. The dataset encompasses recordings from 110 children, aged 4–7 years old, who participated in a letter and digit identification task and extended oral discourse tasks requiring explanation skills, totaling 21 h of session data. Spanning a 2-year collection period, this dataset contains a longitudinal component with a subset of participants returning for repeat recordings. The dataset, with session recordings and transcriptions, is publicly available, providing researchers with a valuable resource to advance investigations into child language development.
Fundamental frequency feature warping for frequency normalization and data augmentation in child automatic speech recognition

https://doi.org/10.1016/j.specom.2021.08.002

Yeung, Gary; Fan, Ruchao; Alwan, Abeer (December 2021, Speech Communication)

Fundamental Frequency Feature Normalization and Data Augmentation for Child Speech Recognition

https://doi.org/10.1109/ICASSP39728.2021.9413801

Yeung, Gary; Fan, Ruchao; Alwan, Abeer (June 2021, ICASSP 2021)
Analysis of Disfluency in Children’s Speech

https://doi.org/10.21437/Interspeech.2020-3037

Tran, Trang; Tinkler, Morgan; Yeung, Gary; Alwan, Abeer; Ostendorf, Mari (October 2020, Interspeech 2020)
A Frequency Normalization Technique for Kindergarten Speech Recognition Inspired by the Role of f0 in Vowel Perception

https://doi.org/10.21437/Interspeech.2019-1847

Yeung, Gary; Alwan, Abeer (September 2019, Interspeech 2019)

On the Difficulties of Automatic Speech Recognition for Kindergarten-Aged Children

https://doi.org/10.21437/Interspeech.2018-2297

Yeung, Gary; Alwan, Abeer (September 2018, Interspeech 2018)

Automatic speech recognition (ASR) systems for children have lagged behind in performance when compared to adult ASR. The exact problems and evaluation methods for child ASR have not yet been fully investigated. Recent work from the robotics community suggests that ASR for kindergarten speech is especially difficult, even though this age group may benefit most from voice-based educational and diagnostic tools. Our study focused on ASR performance for specific grade levels (K-10) using a word identification task. Grade-specific ASR systems were evaluated, with particular attention placed on the evaluation of kindergarten-aged children (5-6 years old). Experiments included investigation of grade-specific interactions with triphone models using feature space maximum likelihood linear regression (fMLLR), vocal tract length normalization (VTLN), and subglottal resonance (SGR) normalization. Our results indicate that kindergarten ASR performs dramatically worse than even 1st grade ASR, likely due to large speech variability at that age. As such, ASR systems may require targeted evaluations on kindergarten speech rather than being evaluated under the guise of “child ASR.” Additionally, results show that systems trained in matched conditions on kindergarten speech may be less suitable than mismatched-grade training with 1st grade speech. Finally, we analyzed the phonetic errors made by the kindergarten ASR.
Target and Non-target Speaker Discrimination by Humans and Machines

https://doi.org/10.1109/ICASSP.2019.8683362

Park, Soo Jin; Afshan, Amber; Kreiman, Jody; Yeung, Gary; Alwan, Abeer (May 2019, IEEE ICASSP 2019)

The manner in which acoustic features contribute to perceiving speaker identity remains unclear. In an attempt to better understand speaker perception, we investigated human and machine speaker discrimination with utterances shorter than 2 seconds. Sixty-five listeners performed a same vs. different task. Machine performance was estimated with i-vector/PLDA-based automatic speaker verification systems, one using mel-frequency cepstral coefficients (MFCCs) and the other using voice quality features (VQual2) inspired by a psychoacoustic model of voice quality. Machine performance was measured in terms of the detection and log-likelihood-ratio cost functions. Humans showed higher confidence for correct target decisions compared to correct non-target decisions, suggesting that they rely on different features and/or decision making strategies when identifying a single speaker compared to when distinguishing between speakers. For non-target trials, responses were highly correlated between humans and the VQual2-based system, especially when speakers were perceptually marked. Fusing human responses with an MFCC-based system improved performance over human-only or MFCC-only results, while fusing with the VQual2-based system did not. The study is a step towards understanding human speaker discrimination strategies and suggests that automatic systems might be able to supplement human decisions especially when speakers are marked.
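The log-likelihood-ratio cost mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the commonly used definition of this cost (often written Cllr) over natural-log likelihood-ratio scores; the function name `cllr` and the example scores are hypothetical and are not taken from the paper.

```python
import math


def cllr(target_llrs, nontarget_llrs):
    """Log-likelihood-ratio cost in bits.

    Assumes scores are natural-log likelihood ratios:
    target trials should score high, non-target trials low.
    """
    def softplus(x):
        # Numerically stable log(1 + exp(x)).
        return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

    # Penalty for target trials with low LLR: log(1 + 1/LR).
    c_tar = sum(softplus(-s) for s in target_llrs) / len(target_llrs)
    # Penalty for non-target trials with high LLR: log(1 + LR).
    c_non = sum(softplus(s) for s in nontarget_llrs) / len(nontarget_llrs)
    # Average the two penalties and convert from nats to bits.
    return 0.5 * (c_tar + c_non) / math.log(2)


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    print(cllr([2.1, 0.4, 3.0], [-1.7, -0.2, -2.5]))
```

Under this definition, a system that always outputs an LLR of 0 (no information) has a cost of 1 bit, and well-separated, well-calibrated scores push the cost toward 0.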
Subglottal resonances of American English speaking children

https://doi.org/10.1121/1.5082289

Yeung, Gary; Lulich, Steven M.; Guo, Jinxi; Sommers, Mitchell S.; Alwan, Abeer (December 2018, The Journal of the Acoustical Society of America)

Towards the development of personalized learning companion robots for early speech and language assessment

https://doi.org/10.302/1431402

Yeung, Gary; Afshan, Amber; Quintero, Marlen; Martin, Alejandra; Spaulding, Samuel; Park, Hae Won; Bailey, Alison; Breazeal, Cynthia; Alwan, Abeer (April 2019, 2019 Annual Meeting of the American Educational Research Association (AERA))

This pilot study investigated the feasibility of implementing child-friendly robots for administering clinical and educational assessments with young children. JIBO, a social robot, was used as a new interface to administer a letter and number naming task and the Goldman-Fristoe Test of Articulation, Third Edition (GFTA-3). These assessment materials were chosen to support the development of robust automatic speech recognition (ASR) and automated social interaction systems that can aid in administering such assessments more efficiently. The voice of JIBO simulates interaction with a peer, and images and playful transitions are displayed on JIBO’s face/screen. Preliminary observations with 15 pre-kindergarten and 18 kindergarten students covered the rate of task completion and strategies to increase student participation. Changes to the length and prompt delivery of the assessment protocol were considered based on these observations, and further observations are planned for future work with an additional cohort of 43 pre-kindergarten and 50 kindergarten students. Recommendations are given to inform future implementations and analyses.
Towards understanding speaker discrimination abilities in humans and machines for text-independent short utterances of different speech styles

https://doi.org/10.1121/1.5045323

Park, Soo Jin; Yeung, Gary; Vesselinova, Neda; Kreiman, Jody; Keating, Patricia A.; Alwan, Abeer (July 2018, The Journal of the Acoustical Society of America)
